Codec integrated voice conversion for embedded speech synthesis

Authors

Guntram Strecha

Oliver Jokisch

Matthias Eichner

Rüdiger Hoffmann

Abstract

Voice conversion technologies transform the individual characteristics of a speaker's voice while preserving the original content, and they are widely applicable in speech processing. Given the limited system resources of embedded concatenative speech synthesis in particular, voice conversion can reduce the memory consumption of the acoustic database: it enables the intra-gender or cross-gender generation of new voices from an existing high-quality voice. Usually, voice conversion is based on the modification of spectral properties in combination with pitch manipulation. A simplified approach uses warping functions in the frequency domain that aim at a reverse vocal tract length normalization (VTLN). Nevertheless, voice conversion itself incurs a critical computational complexity that conflicts with the practical constraints of typical embedded and mobile applications. The authors propose a novel approach to voice conversion that re-uses features of a common speech codec. Such a codec is already available in typical mobile applications, and the resulting voice quality is widely accepted. The paper investigates the manipulation of the immittance spectral frequencies (ISF) provided by the Adaptive Multi-Rate Wideband codec (AMR-WB). This algorithm has been integrated into the embedded speech synthesizer microDRESS.
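The abstract does not spell out how the ISF values are manipulated, so the following is only an illustrative sketch of the general idea of a VTLN-style frequency warp applied to a set of spectral frequencies. The piecewise-linear warping function, the names `warp_frequency` and `warp_isf`, the warping factor `alpha`, and the `knee` parameter are all assumptions made for illustration and are not taken from the paper. The default upper band edge of 6400 Hz assumes the 12.8 kHz internal sampling rate of AMR-WB; any codec-specific handling of individual ISF coefficients is deliberately left out.

```python
def warp_frequency(f_hz, alpha, nyquist_hz=6400.0, knee=0.85):
    """Piecewise-linear frequency warp (hypothetical, for illustration).

    Frequencies below a break point are scaled by alpha; above it, a
    second linear segment maps the band edge onto itself so that the
    warped value never leaves the range [0, nyquist_hz].
    """
    if alpha <= 0.0:
        raise ValueError("alpha must be positive")
    if not 0.0 <= f_hz <= nyquist_hz:
        raise ValueError("frequency outside [0, nyquist_hz]")
    # Break point chosen so that alpha * f0 stays below the band edge.
    f0 = knee * nyquist_hz * min(1.0, 1.0 / alpha)
    if f_hz <= f0:
        return alpha * f_hz
    # Straight line from (f0, alpha * f0) to (nyquist_hz, nyquist_hz).
    slope = (nyquist_hz - alpha * f0) / (nyquist_hz - f0)
    return alpha * f0 + slope * (f_hz - f0)


def warp_isf(isf_hz, alpha, nyquist_hz=6400.0):
    """Warp an ordered list of spectral frequencies given in Hz.

    The warp is strictly increasing, so the ordering of the input
    frequencies is preserved in the output.
    """
    return [warp_frequency(f, alpha, nyquist_hz) for f in isf_hz]


if __name__ == "__main__":
    # Made-up example frequencies, not real codec output.
    example = [300.0, 900.0, 1700.0, 2600.0, 3800.0, 5200.0, 6100.0]
    print(warp_isf(example, alpha=1.1))
```

In this sketch, `alpha` greater than 1 shifts the spectral frequencies upward and `alpha` less than 1 shifts them downward, while `alpha` equal to 1 leaves them unchanged. Operating directly on parameters that a codec already computes is what makes this kind of manipulation cheap compared with a separate spectral analysis step.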

Similar resources

Evaluation of VTLN-based voice conversion for embedded speech synthesis

Recently, we demonstrated that vocal tract length normalization (VTLN) can be applied to voice conversion tasks. In particular, when the conversion algorithm is performed in the time domain, this technique is very resource-efficient and, consequently, suitable for embedded applications. In this paper, we use VTLN-based voice conversion as a novel feature of a small footprint speech synthesizer runni...

Embedded system for speech recognition and image processing

In recent years, voice terminal and image retrieval products have trended toward greater intelligence, but mature commercial products remain rare on the market. This paper presents an embedded design method for an intelligent voice terminal based on pattern recognition. The design adopts a Samsung S3C2410 ARM as the target board, a Philips UDA1341TS as the audio codec, embedded Linux as the software platform, and speech ...

Duration-embedded bi-HMM for expressive voice conversion

This paper presents a duration-embedded Bi-HMM framework for expressive voice conversion. First, Ward’s minimum variance clustering method is used to cluster all the conversion units (sub-syllables) in order to reduce the number of conversion models as well as the size of the required training database. The duration-embedded Bi-HMM trained with the EM algorithm is built for each sub-syllable cl...

Study on Unit-Selection and Statistical Parametric Speech Synthesis Techniques

One interesting topic in the multimedia domain is enabling computers to produce speech. Speech synthesis gives computers the human ability to produce speech. The data-based approach and the process-based approach are the two main approaches to speech synthesis, and each has its own challenges. Unit-selection speech synthesis and statistical parametr...

Designing a new non-parallel training method for voice conversion that performs better than parallel training

Introduction: Voice mimicking by computers has been one of the most challenging topics of speech processing in recent years. A voice conversion system has two sides. On one side is the source speaker, whose voice is changed to mimic the voice of the target speaker (who is on the other side). Two methods of p...

Publication date: 2005
